Abstract: Xinjiang Uygur Autonomous Region is a typical inland arid area in China with a sparse and 
uneven distribution of meteorological stations, limited access to precipitation data, and significant water 
scarcity. Evaluating and integrating precipitation datasets from different sources to accurately characterize 
precipitation patterns remains a challenge; meeting it would provide more accurate alternative precipitation 
information for the region and could improve the performance of hydrological modelling. This 
study evaluated the applicability of five widely used satellite-based precipitation products (Climate 
Hazards Group InfraRed Precipitation with Station (CHIRPS), China Meteorological Forcing Dataset 
(CMFD), Climate Prediction Center morphing method (CMORPH), Precipitation Estimation from 
Remotely Sensed Information using Artificial Neural Networks-Climate Data Record (PERSIANN-CDR), 
and Tropical Rainfall Measuring Mission Multi-satellite Precipitation Analysis (TMPA)) and a reanalysis 
precipitation dataset (ECMWF Reanalysis v5-Land Dataset (ERA5-Land)) in Xinjiang using ground-based 
observational precipitation data from a limited number of meteorological stations. Based on this 
assessment, we proposed a framework that integrated different precipitation datasets with varying spatial 
resolutions using a dynamic Bayesian model averaging (DBMA) approach, the expectation-maximization 
method, and the ordinary Kriging interpolation method. The daily precipitation data merged using the 
DBMA approach exhibited distinct spatiotemporal variability, with an outstanding performance, as 
indicated by low root mean square error (RMSE=1.40 mm/d) and high Pearson's correlation coefficient 
(CC=0.67). Compared with the traditional simple model averaging (SMA) and individual product data, 
the DBMA-fused precipitation data scored slightly lower than the best precipitation product 
(CMFD), but the overall performance of DBMA was more robust. The error analysis between DBMA-fused 
precipitation dataset and the more advanced Integrated Multi-satellitE Retrievals for Global Precipitation 
Measurement Final (IMERG-F) precipitation product, as well as hydrological simulations in the Ebinur Lake 
Basin, further demonstrated the superior performance of DBMA-fused precipitation dataset in the entire 
Xinjiang region. The proposed framework for solving the fusion problem of multi-source precipitation data 
with different spatial resolutions is feasible for application in inland arid areas, and aids in obtaining more 
accurate regional hydrological information and improving regional water resources management capabilities 
and meteorological research in these regions. 
1 Introduction 


Precipitation plays a crucial role when examining worldwide shifts in climate patterns and 
intricate hydrological cycles (Ashouri et al., 2015; Rogelis and Werner, 2018; Kharaghani et al., 
2023), and accurate and stable precipitation estimates are essential for weather monitoring, 
hydrological simulation, and surveillance and prevention of natural disasters (Yin et al., 2021; 
Tadesse et al., 2022). As the most reliable means for recording precipitation at specific locations, 
traditional ground-based precipitation measurement techniques (such as rain gauges and radar 
precipitation estimation) provide direct precipitation observations (Ur Rahman et al., 2019). 
However, influenced by diverse factors, such as topography, climatic conditions, and installation 
and operation costs, ground-based observations have limited spatial coverage and sparse and 
uneven distribution in certain regions (Lanza and Stagi, 2008; Yang et al., 2020), which seriously 
limits the understanding of actual precipitation over a wide spatial extent (Shen and Xiong, 2016). 
For example, one rain gauge is deployed over an area of about 1.58×10⁴ km² in Xinjiang Uygur 
Autonomous Region, China with alternating mountainous-desert terrain and obvious spatial and 
temporal variability in precipitation, which leads to weak spatial representativeness of rain 
gauges, while interpolated spatially continuous precipitation data suffer from large errors, shifted 
precipitation centers, and irrational precipitation distributions (Katipoglu, 2022). In recent years, 
with the advancement of remote sensing technology and reanalysis models, significant 
improvements have been made in addressing the spatiotemporal resolution challenges of 
traditional precipitation measurement networks, but there is regional variability among different 
precipitation datasets due to constraints imposed by data sources and algorithms (Gebregiorgis 
and Hossain, 2015; Muñoz-Sabater et al., 2021). Therefore, evaluating and integrating 
precipitation datasets from different sources to accurately characterize and understand regional 
precipitation patterns has become a challenge, as this approach can provide more accurate and 
alternative precipitation information for hydrological modelling in regions lacking information 
and may improve the performance of hydrological modelling in arid areas of Xinjiang. 

To date, numerous precipitation products with high spatiotemporal resolution from both 
domestic and international sources have been made accessible to the public, including China 
Meteorological Forcing Dataset (CMFD) (He et al., 2020), Climate Prediction Center morphing 
method (CMORPH) (Joyce et al., 2004), Climate Hazards Group InfraRed Precipitation with 
Station (CHIRPS) (Funk et al., 2015), Tropical Rainfall Measuring Mission Multi-satellite 
Precipitation Analysis (TMPA) (Huffman et al., 2007), Precipitation Estimation from Remotely 
Sensed Information using Artificial Neural Networks-Climate Data Record (PERSIANN-CDR) 
(Hong et al., 2004), and ECMWF Reanalysis v5-Land Dataset (ERA5-Land) (Muñoz-Sabater et 
al., 2021). When constrained by various conditions, the global applicability of any precipitation 
product is limited, as its optimal performance is often confined to specific geographical regions. 
For instance, CHIRPS data show significant strengths and higher skill scores in daily, monthly, 
and yearly scale assessments in eastern Africa (Dinku et al., 2018), while TMPA exhibits better 
performance in recording extreme precipitation events in southern China (Wang et al., 2021). 
Moreover, differences in geographical location, elevation, and climatic conditions can result in 
spatial variations in precipitation products (Mosaffa et al., 2020; Bai et al., 2021; Wang et al., 
2022). Tan and Santo (2018) reported that PERSIANN-CDR is unable to capture all precipitation 
categories in Malaysia by underestimating low precipitation levels and overestimating medium to 
heavy precipitation levels, while in the Huaihe River Basin of China, the dataset exhibits limited 
performance in capturing variations in extreme precipitation events (Sun et al., 2021). As 
mentioned earlier, the regional and temporal uncertainty of a single precipitation product may 
affect the quality of precipitation estimates and further limit the accuracy of hydrological 
simulations. 

To address these questions, numerous researchers are dedicated to using various approaches to 
combine the diverse ground-based observations and satellite-based and reanalysis precipitation 
datasets to mitigate the uncertainties of individual members (Baez-Villanueva et al., 2020; Xu et 
al., 2020; Yumnam et al., 2022). Research has indicated that data fusion integrates advantageous 
information from individual members to mitigate the limitations of singular data and exhibits 
superior performance compared to all or most independent precipitation estimates (Chen and 
Jahanshahi, 2018). The main precipitation data fusion approaches that are currently available 
include optimal interpolation (Wei et al., 2023b), data assimilation (Huffman et al., 2007), 
Bayesian model averaging (BMA) (Sloughter et al., 2007), and neural network analysis (Chen et 
al., 2018). Hoeting et al. (1998) proposed that linear combinations of models fail to correctly 
capture the strength of an individual model, while BMA approach can effectively balance the 
weights among multiple models and accurately consider uncertainty. The BMA approach 
integrates a predictive probability density function (PDF) from diverse models, with the 
predictive PDF of an individual variable represented as the weighted average of the posterior 
distribution of the model ensemble (Rings et al., 2012). To calculate the weights for each 
predictive variable, Raftery et al. (2005) utilized a machine learning algorithm 
(expectation-maximization) through iterative model parameter estimation to quantify the relative 
credibility of each model in interpreting the data, and applied it to the predictive PDFs of a 
developed meteorological forecast ensemble. In addition, Sloughter et al. (2007) used a single 
precipitation product to predict PDFs that exhibit a mixture of zero and gamma distributions, and 
successfully extended BMA to quantitatively forecast precipitation, leading to the applications of 
BMA to surface hydrology (Li and Tsai, 2009; Schöniger et al., 2014). Jiang et al. (2012) 
evaluated the applicability of three satellite-based precipitation datasets in the Mishui River 
Basin, China, by utilizing BMA to combine streamflow data from the Xin'anjiang model 
simulations and different precipitation products, which resulted in the merged streamflow 
exhibiting more robust performance. Ma et al. (2018) developed a dynamic Bayesian model 
averaging (DBMA) framework to optimize the relative weights of four satellite-based 
precipitation datasets over the Tibetan Plateau, China, confirming the superiority of DBMA 
ensemble over other conventional ensembles and individual members. Yumnam et al. (2022) 
presented the BMA approach that centered on different quantiles to train optimal weights for three 
precipitation products over the Vamsadhara River Basin along the Indian coast, demonstrating 
enhanced robustness in the performance of the fused precipitation dataset. The development of 
fusion algorithms is crucial for the utilization of precipitation integration schemes, and the choice 
of algorithms and input sources directly influences the performance of the fused precipitation 
dataset (Tang et al., 2020; Wei et al., 2023a). To effectively address the significant challenge of 
insufficient precipitation data in Xinjiang, we proposed a precipitation merging framework based 
on the DBMA approach and six satellite-based and reanalysis precipitation datasets over the 
region. 

This study focused on (1) a detailed comparison of six widely used domestic and international 
high-resolution satellite-based and reanalysis precipitation datasets (CMORPH, CHIRPS, CMFD, 
PERSIANN-CDR, TMPA, and ERA5-Land) with rain gauge ground-based observations; (2) the 
use of the DBMA approach to optimally combine precipitation data from the six sources into an 
ensemble multi-source daily precipitation dataset for 1999-2018, whose performance was 
evaluated through a series of evaluation indicators and compared with the more advanced 
Integrated Multi-satellitE Retrievals for Global Precipitation Measurement Final (IMERG-F) 
product; and (3) the use of a semi-distributed hydrological model to comprehensively assess the 
hydrological simulation performance of the DBMA-fused precipitation and the IMERG-F 
precipitation product in the Ebinur Lake Basin, Xinjiang. In this 
study, bilinear interpolation was employed to standardize the spatial resolution of multiple 
precipitation sources. The expectation-maximization algorithm was used to iterate the optimal 
weights of each fusion member at the ground meteorological stations. Finally, the point weights 
were spatially diffused to obtain continuous merged precipitation data via the ordinary Kriging 
interpolation method. The results of this research can effectively improve the accuracy of 
hydrological simulations in areas with missing precipitation data and enhance the effectiveness of 
regional water resources management in inland arid areas. 
2 Study area and data sources 


2.1 Study area 


The Xinjiang Uygur Autonomous Region, China is located between 73°40'-96°18'E longitude 
and 34°25'-48°10'N latitude and is situated in the hinterland of the Eurasian Continent; its 
geographical area covers approximately one-sixth of the total land in China. Xinjiang has 
complex terrain characterized by an interplay of basins and mountain ranges, constituting a 
distinctive mountain-basin geomorphic system (Fig. 1). With the Altay Mountains on the northern 
boundary and the Kunlun Mountains on the southern boundary, the Tianshan Mountains in the 
middle are the natural geographical dividing line between the Junggar Basin and Tarim Basin. 
Xinjiang's topography exhibits significant differences in elevation and has obvious vertical 
climatic characteristics. The lowest elevation point occurs at the Aydingkol Lake (155 m below 
sea level) in the Turpan Basin, and the highest elevation point is at the Qogir Peak (8611 m above 
sea level). Although located in the northern temperate zone, Xinjiang is positioned far inland and 
distant from the sea, marking the culmination of atmospheric water vapour transport. Coupled 
with the obstruction of Indian Ocean moisture by the southern Tibetan Plateau, this leads to water 
vapour from the Atlantic Ocean becoming the primary moisture source for Xinjiang, as it is 
carried by the westerly circulation (Yin et al., 2017). In addition, the intercepting effect of high 
mountain ranges on westerly circulation and dry-cold moisture in the Arctic Ocean makes the arid 
mountainous areas of the region notable for their excess precipitation over the basin plains (Hu et 
al., 2021). 

The spatial distribution of precipitation in Xinjiang is extremely uneven, with an average 
annual precipitation of only 157.7 mm. The northern region of Xinjiang receives more than 80% 
of the total annual precipitation, while the southern region of Xinjiang only receives less than 
20%. Mountainous areas contribute 81% of Xinjiang's total annual precipitation, while plain 
regions (including desert and barren areas) receive only 19% (Zhang et al., 2022). As a typical 
inland river basin in Xinjiang, the Ebinur Lake Basin is characterized by low precipitation and 
high evaporation, and the basin's water resources are very sensitive to climate change. 
Fig. 1 Overview of Xinjiang Uygur Autonomous Region (a) and the Ebinur Lake Basin (b) based on the digital 
elevation model (DEM). Note that the figures are based on the standard map (新S(2023)064) of the Map Service 
System (https://xinjiang.tianditu.gov.cn/main/bzdt.html) marked by the Xinjiang Uygur Autonomous Region 
Platform for Common Geospatial Information Services, and the standard map has not been modified. 


2.2 Data 


2.2.1 Ground-based observational precipitation data 
The precipitation data from 105 ground meteorological stations distributed in Xinjiang used in this 
study were sourced from the daily precipitation dataset of China's meteorological element 
observation sites available at the Resource and Environment Science and Data Center 
(https://www.resdc.cn/data.aspx?DATAID=230). This dataset comprises precipitation records from 
26 national benchmark meteorological stations, 40 national basic meteorological stations, and 39 
national general meteorological stations. All downloaded meteorological data underwent a sequence 
of quality control procedures, including formatting, internal consistency checks, elimination of 
duplicates, and identification and removal of anomalous data. The daily precipitation dataset refers 
to the accumulated precipitation within a 24-h period starting at 08:00 (Beijing time), which 
corresponds to Coordinated Universal Time (UTC) at 00:00. Therefore, the daily accumulated 
precipitation at the ground meteorological stations is temporally consistent with the precipitation 
product measured at 00:00 UTC. 
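
The alignment described above matters most for products delivered at sub-daily steps. The following is a minimal sketch, assuming each hourly record carries a timezone-aware timestamp that marks the start of its accumulation hour (a simplifying assumption; the timestamp convention of each real product should be checked). The function name `daily_totals_utc` and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta, timezone

# Beijing time is UTC+8, so a gauge day starting at 08:00 Beijing time
# starts at 00:00 UTC.
BEIJING = timezone(timedelta(hours=8))


def daily_totals_utc(hourly) -> dict[date, float]:
    """Sum hourly precipitation (mm) into 24-h totals per UTC calendar day.

    `hourly` is an iterable of (timestamp, mm) pairs; each timestamp is
    timezone-aware and marks the start of the accumulation hour (assumption).
    """
    totals = defaultdict(float)
    for stamp, mm in hourly:
        utc_day = stamp.astimezone(timezone.utc).date()
        totals[utc_day] += mm
    return dict(totals)


# Hypothetical records expressed in Beijing time.
records = [
    (datetime(2017, 6, 1, 7, tzinfo=BEIJING), 1.0),  # 31 May 23:00 UTC
    (datetime(2017, 6, 1, 8, tzinfo=BEIJING), 2.0),  # 1 June 00:00 UTC
    (datetime(2017, 6, 2, 7, tzinfo=BEIJING), 0.5),  # 1 June 23:00 UTC
]
totals = daily_totals_utc(records)
```

Bucketing by UTC calendar day reproduces the 08:00-to-08:00 Beijing-time window of the gauge records without any explicit offset arithmetic.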

Based on the temporal overlap period of meteorological observations and different precipitation 
product datasets, we defined the period in this study as 1999-2018. Additionally, the seasons 
mentioned in the paper followed the universally accepted Gregorian calendar standard for division. 
Specifically, spring is defined as the period from March to May, summer from June to August, 
autumn from September to November, and winter from December to February of the next year. 
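
The seasonal division above can be written as a small lookup. This is an illustrative sketch only; the helper names `season_of` and `season_label` are hypothetical, and labelling a winter by the year of its December is an assumption consistent with "December to February of the next year".

```python
def season_of(month: int) -> str:
    """Map a calendar month (1-12) to the season used in this study."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    if 3 <= month <= 5:
        return "spring"
    if 6 <= month <= 8:
        return "summer"
    if 9 <= month <= 11:
        return "autumn"
    return "winter"  # December, January, February


def season_label(year: int, month: int) -> tuple[str, int]:
    """Return (season, season_year); January and February are assigned
    to the winter that began in December of the previous year."""
    season = season_of(month)
    if season == "winter" and month <= 2:
        return season, year - 1
    return season, year
```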
2.2.2 Satellite-based and reanalysis precipitation products 
This study reviewed various satellite-based precipitation products (including CMORPH, CHIRPS, 
CMFD, PERSIANN-CDR, and TMPA) and the reanalysis precipitation dataset (ERA5-Land) to 
address the limited ground-based observation challenges in accurately estimating precipitation in 
data-scarce regions. In addition, the IMERG-F precipitation product was integrated with Global 
Precipitation Measurement (GPM), Tropical Rainfall Measuring Mission (TRMM), and other 
satellite-based data generated with a delay of approximately 2.5 months based on the monthly 
precipitation analysis of the Global Precipitation Climatology Centre (GPCC) product (Hou et al., 
2014). This dataset provides high-quality precipitation estimates and is considered more suitable for 
scientific research (Wang et al., 2017). This paper compared the IMERG-F product with the 
DBMA-fused precipitation dataset in terms of independent site analysis and watershed simulation 
effectiveness. A further description of the precipitation products is provided in Table 1. 


Table 1 General information of the precipitation products used in this study 


Product        Spatial resolution  Temporal resolution  Coverage           Period        Website
CHIRPS         0.05°               Daily                50°S-50°N          1981-present  https://www.chc.ucsb.edu/data/chirps
CMFD           0.10°               Daily                China (land only)  1979-2018     https://data.tpdc.ac.cn/zh-hans/data
CMORPH         0.25°               Daily                60°S-60°N          1998-present  https://climatedataguide.ucar.edu/climate-data
TMPA           0.25°               Daily                50°S-50°N          1998-2019     https://gpm.nasa.gov/data
PERSIANN-CDR   0.25°               Daily                60°S-60°N          1983-present  https://climatedataguide.ucar.edu/climate-data
ERA5-Land      0.10°               Hourly               Earth              1950-present  https://cds.climate.copernicus.eu
IMERG-F        0.10°               Hourly               Earth              2000-present  https://gpm.nasa.gov/data/imerg


Note: CHIRPS, Climate Hazards Group InfraRed Precipitation with Station; CMFD, China Meteorological Forcing Dataset; CMORPH, 
Climate Prediction Center morphing method; TMPA, Tropical Rainfall Measuring Mission Multi-satellite Precipitation Analysis; 
PERSIANN-CDR, Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Climate Data Record; 
ERA5-Land, ECMWF Reanalysis v5-Land; IMERG-F, Integrated Multi-satellitE Retrievals for Global Precipitation Measurement Final. 


2.2.3 Other data sources 

The recorded daily streamflow data from 2001 to 2014 at the Wenquan station in the Ebinur Lake 
Basin were provided by the Water Resources Department of Xinjiang Uygur Autonomous Region, 
China. Daily-scale meteorological forcing data for the period 2001-2014, including longwave and 
shortwave radiation, temperature, barometric pressure, and wind speed, were obtained from the 
China Meteorological Data Service Centre (http://data.cma.cn/), which are interpolated by the 
ordinary Kriging interpolation method. The soil-cover data were obtained from the Harmonized 
World Soil Database (HWSD) developed by the International Institute for Applied Systems Analysis 
(IIASA) and the Food and Agriculture Organization of the United Nations (FAO). This study 
utilized the 2023 updated version of the HWSD (HWSD v.2.0) (https://gaez.fao.org/pages/hwsd). 
Vegetation-cover data for 1998 were obtained from the global land cover database developed by the 
University of Maryland (UMD), USA. The classification system for the UMD land cover data was 
designed for the Simple Biosphere Model (SiB) and was divided into a total of 14 land cover types. 


3 Methods 


3.1 Framework 


Figure 2 shows the general process of the DBMA approach used to merge gridded precipitation 
data (including satellite-based and reanalysis precipitation datasets) with different spatial 
resolutions. The general process of merging gridded precipitation data used in this study builds 
on the algorithm introduced by Ma et al. (2018), which unifies gridded precipitation 
data with different spatial resolutions through a bilinear interpolation method, introduces a 
"discrete-continuous" hybrid model to describe the PDF of precipitation, and combines multiple 
gridded precipitation datasets using the DBMA approach. 
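
To illustrate the resolution-unification step, the following is a minimal NumPy sketch of bilinear interpolation from a coarse regular grid to finer target coordinates. It assumes ascending coordinate vectors and target points that lie inside the source extent; the function name `bilinear_regrid` and the 0.25° example grid are hypothetical and are not taken from the authors' implementation.

```python
import numpy as np


def bilinear_regrid(field, src_lat, src_lon, dst_lat, dst_lon):
    """Bilinearly interpolate `field` (shape: len(src_lat) x len(src_lon))
    onto the grid defined by `dst_lat` and `dst_lon`.

    Assumes ascending coordinates and targets within the source extent.
    """
    field = np.asarray(field, dtype=float)
    src_lat = np.asarray(src_lat, dtype=float)
    src_lon = np.asarray(src_lon, dtype=float)
    dst_lat = np.asarray(dst_lat, dtype=float)
    dst_lon = np.asarray(dst_lon, dtype=float)

    # Index of the source cell's lower corner for every target coordinate.
    i = np.clip(np.searchsorted(src_lat, dst_lat) - 1, 0, len(src_lat) - 2)
    j = np.clip(np.searchsorted(src_lon, dst_lon) - 1, 0, len(src_lon) - 2)

    # Fractional position of each target inside its source cell.
    ty = ((dst_lat - src_lat[i]) / (src_lat[i + 1] - src_lat[i]))[:, None]
    tx = ((dst_lon - src_lon[j]) / (src_lon[j + 1] - src_lon[j]))[None, :]

    # Values at the four surrounding source nodes.
    f00 = field[np.ix_(i, j)]
    f01 = field[np.ix_(i, j + 1)]
    f10 = field[np.ix_(i + 1, j)]
    f11 = field[np.ix_(i + 1, j + 1)]

    return ((1 - ty) * (1 - tx) * f00 + (1 - ty) * tx * f01
            + ty * (1 - tx) * f10 + ty * tx * f11)


# Hypothetical 0.25-degree source grid carrying a planar test field.
src_lat = np.array([40.0, 40.25, 40.5])
src_lon = np.array([80.0, 80.25, 80.5])
coarse = 2.0 * src_lat[:, None] + 3.0 * src_lon[None, :]
dst_lat = np.array([40.1, 40.4])
dst_lon = np.array([80.05, 80.3])
fine = bilinear_regrid(coarse, src_lat, src_lon, dst_lat, dst_lon)
```

Bilinear interpolation reproduces planar fields exactly, which makes a plane a convenient check; real precipitation fields are far less smooth, so the interpolated values are only approximations.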

[Flowchart elements: ground-based precipitation observations; satellite-based and reanalysis precipitation products (CMORPH, CHIRPS, CMFD, PERSIANN-CDR, TMPA, ERA5-Land); bilinear interpolation method (uniform spatial resolution); training datasets (60 d, 3 a); discrete-continuous model; log-likelihood function; EM algorithm; optimal grid weights (dynamic in space and time); ordinary Kriging and normalization algorithms; DBMA multi-source precipitation datasets]


Fig. 2 Flowchart of the dynamic Bayesian model averaging (DBMA) approach for blending gridded 
precipitation data with different spatial resolutions. CHIRPS, Climate Hazards Group InfraRed Precipitation with 
Station; CMFD, China Meteorological Forcing Dataset; CMORPH, Climate Prediction Center morphing method; 
TMPA, Tropical Rainfall Measuring Mission Multi-satellite Precipitation Analysis; PERSIANN-CDR, 
Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Climate Data 
Record; ERA5-Land, ECMWF Reanalysis v5-Land; EM, Expectation-Maximization. 


We estimated the model parameters by training the model with simulated and observed data 
from the N days prior to model initialization. The training period used for weight estimation 
has varied among previous studies. Sloughter et al. (2007) chose a month-long training duration 
to apply the BMA approach to 48 h forecasts over the Pacific Northwest region of North America, 
while Ma et al. (2018) introduced a DBMA framework with a 40-d training timeframe to update 
the relative weights of four satellite-based precipitation datasets over the Tibetan Plateau 
region. Empirical evidence has shown that the suitable training duration depends strongly on 
the specific conditions of the study area, so it is more reliably chosen by validation against 
actual data from the study area. 

To evaluate the model training length, we calculated the root mean square error (RMSE) and 
Pearson's correlation coefficient (CC) for the DBMA dataset against precipitation data from 105 
meteorological stations in Xinjiang in 2017, and compared these values across a range of 
potential training days (N=20, 25, 30, ..., 100). Figure 3 shows that, as the training period 
lengthens, the RMSE initially decreases and the CC initially increases, and both reach their 
optimal values at 60 d. Beyond 60 d, the RMSE rises and the CC falls, indicating deteriorating 
performance. 
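
The selection procedure can be sketched as a search over candidate training lengths. The snippet below is a simplified illustration, assuming the merged series for each candidate N has already been produced by a separate DBMA run; the input `merged_by_n`, the function names, and the synthetic data are hypothetical.

```python
import numpy as np


def rmse(est, obs):
    """Root mean square error between estimated and observed series."""
    est, obs = np.asarray(est, float), np.asarray(obs, float)
    return float(np.sqrt(np.mean((est - obs) ** 2)))


def pearson_cc(est, obs):
    """Pearson's correlation coefficient between two series."""
    return float(np.corrcoef(est, obs)[0, 1])


def best_training_length(candidates, merged_by_n, obs):
    """Score each candidate length and return the one with the lowest RMSE,
    using the highest CC to break ties."""
    scores = {n: (rmse(merged_by_n[n], obs), pearson_cc(merged_by_n[n], obs))
              for n in candidates}
    best = min(scores, key=lambda n: (scores[n][0], -scores[n][1]))
    return best, scores


# Synthetic stand-in data: the error grows with distance from N = 60.
obs = np.array([0.0, 1.2, 0.0, 3.4, 0.6, 0.0, 2.1, 0.9])
pattern = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
candidates = list(range(20, 101, 5))
merged_by_n = {n: obs + abs(n - 60) / 100 * pattern for n in candidates}
best, scores = best_training_length(candidates, merged_by_n, obs)
```

Ranking by RMSE first and CC second is one possible convention; the study reports that both indicators happen to agree on the same optimum.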
Fig. 3 Variations in the two indicators (RMSE and CC) between DBMA-based precipitation and ground-based 
observational precipitation data across different training period lengths. RMSE, root mean square error; CC, 
Pearson's correlation coefficient. 


3.2 BMA technique 


BMA is an ensemble technique in the fields of statistics and machine learning that combines 
forecasts and inferences from multiple competing models to generate more precise and 
dependable probabilistic combinations (Raftery et al., 2005). BMA allocates a weight to each 
model based on its posterior probability during the training period, where the weight 
reflects how much each model contributes to the overall predictive likelihood. 

The BMA approach is briefly described as follows. Assuming that the fused precipitation 
variable is $y$ and the number of models is $k$, the prediction $f_i$ of each ensemble member is 
associated with a conditional PDF $h_i(y \mid f_i)$ (the posterior distribution of $y$ based on the 
training data of predictor $i$). Therefore, the PDF of fused precipitation variable $y$ is calculated 
as follows: 

$$p(y \mid f_1, f_2, \ldots, f_k) = \sum_{i=1}^{k} w_i\, h_i(y \mid f_i), \tag{1}$$

where $p(y \mid f_1, f_2, \ldots, f_k)$ is the forecast PDF of the BMA ensemble $y$ given the observed 
data; and $w_i$ corresponds to the relative contribution of predictor $i$ to the predicted variable 
during the training period, also known as the posterior probability of member $i$. To ensure that 
the ensemble prediction constitutes a complete probability distribution during the same training 
period, the sum of the posterior probabilities of the members is equal to 1. 
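
The weights in Equation 1 are typically obtained with the expectation-maximization (EM) algorithm. The sketch below illustrates the idea for the simpler Gaussian case of Raftery et al. (2005), with one common variance shared by all members; it is not the discrete-continuous precipitation model used in this study, and the function name `bma_em_weights` and the synthetic data are hypothetical.

```python
import numpy as np


def bma_em_weights(forecasts, obs, n_iter=200, tol=1e-8):
    """Estimate BMA weights by EM, assuming Gaussian conditional PDFs
    centered on each member forecast with a shared variance.

    forecasts: array (n_times, k); obs: array (n_times,).
    Returns (weights of shape (k,), shared standard deviation).
    """
    f = np.asarray(forecasts, dtype=float)
    y = np.asarray(obs, dtype=float)
    n, k = f.shape
    w = np.full(k, 1.0 / k)                       # start from equal weights
    sigma2 = np.var(y - f.mean(axis=1)) + 1e-6    # initial shared variance
    err2 = (y[:, None] - f) ** 2
    for _ in range(n_iter):
        # E-step: responsibility of each member for each observation.
        dens = np.exp(-0.5 * err2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)
        z = w * dens + 1e-300                     # guard against all-zero rows
        z /= z.sum(axis=1, keepdims=True)
        # M-step: update the weights and the shared variance.
        w_new = z.mean(axis=0)
        sigma2 = max(float((z * err2).sum() / n), 1e-12)
        converged = np.max(np.abs(w_new - w)) < tol
        w = w_new
        if converged:
            break
    return w, float(np.sqrt(sigma2))


# Synthetic example: member 0 tracks the observations most closely.
rng = np.random.default_rng(0)
y = rng.normal(5.0, 2.0, 300)
members = np.column_stack([
    y + rng.normal(0.0, 0.1, 300),
    y + rng.normal(0.0, 2.0, 300),
    rng.normal(5.0, 2.0, 300),
])
w, sigma = bma_em_weights(members, y)
```

Because the responsibilities are normalized per time step, the estimated weights always sum to 1, matching the constraint stated above.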

Raftery et al. (2005) fitted the conditional PDF of variables such as temperature and sea-level 
pressure with a normal distribution centered on the bias-corrected forecast. However, cumulative 
precipitation equals zero in most cases, and the distribution of its non-zero part is highly 
skewed; therefore, the normal distribution is not suitable for precipitation data. To extend BMA 
to precipitation variables, Sloughter et al. (2007) proposed a hybrid model based on 
discrete-continuous component distributions for BMA modelling, where the conditional PDF of each 
ensemble member is a mixture of a discrete component at zero and a gamma distribution. This 
model was successfully applied to the daily forecast of 24 h cumulative precipitation over the 
Pacific Northwest of North America through a mesoscale ensemble at the University of 
Washington, USA. 

In the standard BMA used by Raftery et al. (2005), the conditional PDF $h_i(y \mid f_i)$ (Eq. 1) is 
usually approximated using the normal distribution centered on a linear function, while in the mixed 
model with the "discrete-continuous" component distribution proposed by Sloughter et al. (2007), 
the conditional PDF for the precipitation variable is expressed as follows: 

$$p(y_0 \mid f_1, f_2, \ldots, f_k) = \sum_{i=1}^{k} w_i \left\{ p(y_0 = 0 \mid f_i)\, I[y_0 = 0] + p(y_0 > 0 \mid f_i)\, g_i(y_0 \mid f_i)\, I[y_0 > 0] \right\}, \tag{2}$$

where $y_0$ is the cube root of cumulative precipitation; and the general indicator function 
$I[y_0 = 0]$ or $I[y_0 > 0]$ is unity if the condition in brackets holds, otherwise it is zero. The 
term $p(y_0 = 0 \mid f_i)$ is specified by Equation 3. The conditional PDF $g_i(y_0 \mid f_i)$ of 
precipitation amount $y_0$ given that it is positive is a gamma distribution with the PDF given by 
Equation 4. 

$$\operatorname{logit}\, p(y_0 = 0 \mid f_i) = a_{0i} + a_{1i} f_i^{1/3} + a_{2i} \delta_i, \tag{3}$$

$$g_i(y_0 \mid f_i) = \frac{1}{\beta_i^{\alpha_i}\, \Gamma(\alpha_i)}\, y_0^{\alpha_i - 1} \exp\left(-\frac{y_0}{\beta_i}\right), \tag{4}$$

where logit is the logit function; $a_{0i}$, $a_{1i}$, and $a_{2i}$ are parameters, which are determined by 
logistic regression using precipitation (or zero precipitation) as the dependent variable and 
$f_i^{1/3}$ and $\delta_i$ as the predictor variables; if $f_i = 0$, $\delta_i$ is equal to 1, otherwise it is 
equal to 0; and $\Gamma$ is the gamma function. The parameters $\alpha_i = \mu_i^2 / \sigma_i^2$ and 
$\beta_i = \sigma_i^2 / \mu_i$ of the gamma distribution depend on $f_i$ through the following 
relationships: 

$$\mu_i = b_{0i} + b_{1i} f_i^{1/3}, \tag{5}$$

$$\sigma_i^2 = c_0 + c_1 f_i, \tag{6}$$

where $\mu_i$ and $\sigma_i^2$ are the mean and the variance of the distribution, respectively. The 
parameters $b_{0i}$ and $b_{1i}$ are determined by linear regression, and the parameters $c_0$ and $c_1$ 
are estimated by the maximum likelihood technique from the training data. 
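
The discrete-continuous mixture can be evaluated numerically once its parameters are known. The sketch below, using only the Python standard library, follows the form of Equations 2-6: a logistic model for the probability of zero precipitation and a gamma density whose shape and scale follow from the mean and variance (mean = αβ and variance = αβ², hence α = μ²/σ² and β = σ²/μ). All coefficient values and function names are hypothetical placeholders, not fitted values from this study.

```python
import math


def gamma_shape_scale(mean, variance):
    """Convert a gamma mean and variance into shape (alpha) and scale (beta)."""
    return mean ** 2 / variance, variance / mean


def gamma_pdf(y0, alpha, beta):
    """Gamma density with shape alpha and scale beta, evaluated at y0."""
    if y0 <= 0:
        return 0.0
    log_pdf = ((alpha - 1.0) * math.log(y0) - y0 / beta
               - alpha * math.log(beta) - math.lgamma(alpha))
    return math.exp(log_pdf)


def prob_zero(f, a0, a1, a2):
    """P(y0 = 0 | f) from the logistic model with predictors f**(1/3), delta."""
    delta = 1.0 if f == 0 else 0.0
    x = a0 + a1 * f ** (1.0 / 3.0) + a2 * delta
    return 1.0 / (1.0 + math.exp(-x))


def mixture_density(y0, forecasts, weights, logit_coefs, mean_coefs, c0, c1):
    """Return (P(y0 = 0), density at y0 > 0) of the weighted mixture."""
    p0, dens = 0.0, 0.0
    for f, w, (a0, a1, a2), (b0, b1) in zip(forecasts, weights,
                                            logit_coefs, mean_coefs):
        pz = prob_zero(f, a0, a1, a2)
        p0 += w * pz
        if y0 > 0:
            mu = b0 + b1 * f ** (1.0 / 3.0)   # mean of the gamma component
            var = c0 + c1 * f                 # variance of the gamma component
            alpha, beta = gamma_shape_scale(mu, var)
            dens += w * (1.0 - pz) * gamma_pdf(y0, alpha, beta)
    return p0, dens


# Hypothetical two-member example (forecasts in mm/d, weights sum to 1).
args = ([1.0, 8.0], [0.4, 0.6],
        [(0.5, -1.0, 1.0), (0.5, -1.0, 1.0)],
        [(0.2, 0.8), (0.2, 0.8)], 0.1, 0.05)
p0, density_at_one = mixture_density(1.0, *args)
```

A useful sanity check is that the point mass at zero plus the integral of the continuous part equals 1, which holds whenever the weights sum to 1.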


3.3 Performance evaluation 


In this study, several common statistical indicators were utilized to quantitatively evaluate the 
performances of different satellite-based and reanalysis precipitation products and merged 
precipitation datasets in terms of precipitation estimation. Evaluation indicators of CC, RMSE, 
and relative bias (RB) were used to assess the consistency between the estimated and observed 
precipitation data. The mean absolute error (MAE), Kling-Gupta efficiency (KGE), and Theil's U 
statistic were utilized to measure the average error, similarity, and disparity, respectively, between 
the estimated and observed precipitation data. Additionally, the ability of the gridded 
precipitation datasets to detect precipitation was evaluated using the following three indicators: 
the probability of detection (POD) to measure the ability of precipitation product to detect the 
actual precipitation events, the false alarm rate (FAR) to represent the proportion of incorrectly 
predicted precipitation events according to the estimated precipitation, and the critical success 
index (CSI) to gauge the overall accuracy of the estimated precipitation in predicting 
precipitation. The optimal scores for POD and CSI are both 1.00, while the optimal score for 
FAR is 0.00; moving towards these optimal scores indicates more satisfactory estimation 
results. Furthermore, the Nash-Sutcliffe efficiency (NSE) was utilized to quantitatively assess 
the ability of the variable infiltration capacity (VIC) model to simulate streamflow. The equations 
and optimal values of the above indicators are shown in Table 2. 
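Table 2 Equations and optimal values of the evaluation indicators used in this study 


Evaluation indicator    Equation    Optimal value 

Mean absolute error (MAE; mm/d)    $\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| M_i - G_i \right|$    0
Pearson's correlation coefficient (CC)    $\mathrm{CC} = \frac{\sum_{i=1}^{n} (M_i - \bar{M})(G_i - \bar{G})}{\sqrt{\sum_{i=1}^{n} (M_i - \bar{M})^2}\, \sqrt{\sum_{i=1}^{n} (G_i - \bar{G})^2}}$    1
Root mean square error (RMSE; mm/d)    $\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (M_i - G_i)^2}$    0
Relative bias (RB; %)    $\mathrm{RB} = \frac{\sum_{i=1}^{n} (M_i - G_i)}{\sum_{i=1}^{n} G_i} \times 100\%$    0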
f n 3 B Ty 
Theil's U (4) u=] (M, -6,79 MG 0 
fay a jal 
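Kling-Gupta efficiency (KGE)    $\mathrm{KGE} = 1 - \sqrt{(\mathrm{CC} - 1)^2 + (\beta - 1)^2 + (\gamma - 1)^2}$, where $\beta = \bar{M} / \bar{G}$ and $\gamma = CV_M / CV_G$    1
Critical success index (CSI)    $\mathrm{CSI} = \frac{H}{H + T + F}$    1
Probability of detection (POD)    $\mathrm{POD} = \frac{H}{H + T}$    1
False alarm rate (FAR)    $\mathrm{FAR} = \frac{F}{H + F}$    0
Nash-Sutcliffe efficiency (NSE)    $\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n} (S_{\mathrm{sim},i} - S_{\mathrm{obs},i})^2}{\sum_{i=1}^{n} (S_{\mathrm{obs},i} - \bar{S}_{\mathrm{obs}})^2}$    1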


Note: n is the number of samples; $M_i$ and $G_i$ are the estimated and observed precipitation data, respectively; $\bar{M}$ and $\bar{G}$ are the mean 
of estimated precipitation data and the mean of observed precipitation data, respectively; $CV_M$ and $CV_G$ are the coefficients of variation 
of estimated and observed precipitation data, respectively; $\beta$ is the mean ratio between estimated and observed values; $\gamma$ is the variability 
ratio (ratio of the coefficients of variation) between estimated and observed values; H is the number of observed precipitation events simultaneously monitored by the 
evaluated precipitation products; T is the number of observed precipitation events that were not monitored by the evaluated precipitation 
products; F is the number of "unobserved" precipitation events monitored by the evaluated precipitation products; $S_{\mathrm{sim}}$ and $S_{\mathrm{obs}}$ are the 
simulated and observed streamflow, respectively; and $\bar{S}_{\mathrm{obs}}$ is the average of the observed streamflow values. 
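
Several of these indicators can be computed in a few lines of NumPy. The sketch below covers RB, KGE, and the three detection scores under the definitions of H, T, and F given in the note; the wet-day threshold of 0.1 mm/d is an assumption for illustration (the threshold is not stated here), and the function names and sample values are hypothetical.

```python
import numpy as np


def relative_bias(est, obs):
    """Relative bias in percent."""
    est, obs = np.asarray(est, float), np.asarray(obs, float)
    return float(100.0 * np.sum(est - obs) / np.sum(obs))


def kge(est, obs):
    """Kling-Gupta efficiency with mean ratio and CV ratio."""
    est, obs = np.asarray(est, float), np.asarray(obs, float)
    cc = np.corrcoef(est, obs)[0, 1]
    beta = est.mean() / obs.mean()
    gamma = (est.std() / est.mean()) / (obs.std() / obs.mean())
    return float(1.0 - np.sqrt((cc - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2))


def detection_scores(est, obs, threshold=0.1):
    """Return (POD, FAR, CSI); a day is 'wet' if precipitation >= threshold.

    The 0.1 mm/d default is an assumed threshold. The function also assumes
    at least one hit, so that no denominator is zero.
    """
    est_wet = np.asarray(est, float) >= threshold
    obs_wet = np.asarray(obs, float) >= threshold
    hits = int(np.sum(est_wet & obs_wet))          # H
    misses = int(np.sum(~est_wet & obs_wet))       # T
    false_alarms = int(np.sum(est_wet & ~obs_wet)) # F
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms)
    csi = hits / (hits + misses + false_alarms)
    return pod, far, csi


# Hypothetical daily series (mm/d).
obs = np.array([0.0, 1.0, 2.0, 0.0, 3.0])
est = np.array([0.5, 1.0, 0.0, 0.0, 3.0])
pod, far, csi = detection_scores(est, obs)
```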


To evaluate the developed DBMA dataset, we further used a traditional ensemble method (simple 
model averaging (SMA)) for intercomparison. The SMA scheme assigns the same weight to all 
members, i.e., the arithmetic mean of all individuals, which is calculated as follows: 

$$R = \frac{1}{m} \sum_{d=1}^{m} Q_d, \tag{7}$$

where $R$ (mm) is the mixed precipitation data using the SMA method; $m$ is the number of 
precipitation products; and $Q_d$ is the precipitation estimate of the $d$-th ensemble member. 
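
For contrast with a weighted scheme, the snippet below sketches SMA alongside a generic merge that uses per-cell weights normalized to sum to 1. The weighted variant is a simplified stand-in for the role that spatially varying weights play, not the DBMA predictive mean itself; the function names and sample values are hypothetical.

```python
import numpy as np


def simple_model_average(members):
    """SMA: arithmetic mean over the member axis (axis 0)."""
    return np.mean(np.asarray(members, dtype=float), axis=0)


def weighted_merge(members, weights):
    """Merge members with per-cell weights, normalized to sum to 1 per cell.

    Assumes the weights at every cell have a positive sum.
    """
    members = np.asarray(members, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum(axis=0, keepdims=True)
    return np.sum(weights * members, axis=0)


# Three hypothetical members on two grid cells (mm/d).
members = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
sma = simple_model_average(members)
equal = weighted_merge(members, np.ones_like(members))
skewed = weighted_merge(members, np.array([[2.0, 0.0], [0.0, 0.0], [0.0, 2.0]]))
```

With equal weights the two functions coincide, which makes SMA a natural baseline for any scheme that learns its weights from data.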


4 Results 


4.1 Accuracy evaluation of multi-source precipitation datasets 


Table 3 compares the daily precipitation of the six satellite-based and reanalysis precipitation
products with ground-based observational precipitation data from 1999 to 2018. CMFD exhibited
the highest correlation with observed precipitation (CC=0.62), followed by ERA5-Land (CC=0.55),
while the remaining four precipitation products exhibited considerably lower correlations (CCs
between 0.25 and 0.28). The spatial distribution of CC at the meteorological stations showed that
values in the northern region of the study area were generally higher than those in the southern
region (Fig. 4). In terms of RB, CMFD, CHIRPS, and TMPA slightly underestimated the actual
precipitation, while the remaining three precipitation products overestimated it to different
degrees (Table 3). CMFD had the best agreement with observed precipitation (RB=-0.11%), while
the ERA5-Land dataset strongly overestimated the daily precipitation in Xinjiang (RB=34.46%).
According to the RMSE, the errors of the estimated daily precipitation were of broadly similar
size across the six datasets, ranging from 1.78 to 2.73 mm/d.


Table 3 Statistical evaluation of the daily precipitation for the six satellite-based and reanalysis precipitation 
products compared with the ground-based observational precipitation data during 1999-2018 


Statistic     CHIRPS   CMORPH   CMFD    PERSIANN-CDR   TMPA    ERA5-Land
CC            0.27     0.25     0.62    0.28           0.28    0.55
RMSE (mm/d)   2.58     2.73     1.78    2.16           2.38    2.00
RB (%)        -0.96    11.87    -0.11   16.07          -5.08   34.46
POD           0.33     0.45     0.83    0.71           0.40    0.92
FAR           0.67     0.62     0.58    0.70           0.61    0.72
CSI           0.20     0.26     0.39    0.26           0.25    0.27


Furthermore, we assessed the accuracy of the six satellite-based and reanalysis precipitation
products for daily precipitation in each season (Table 4). Overall, the correlation between the
products and the ground-based observational precipitation data was stronger in summer (CCs:
0.24-0.67) than in autumn (CCs: 0.23-0.63), spring (CCs: 0.15-0.60), and winter (CCs: 0.01-0.64).
CMFD had the highest correlations and lowest errors in spring, summer, and autumn, with only
slight overestimation or underestimation of the actual daily precipitation, whereas it markedly
overestimated precipitation in winter (RB=9.94%). ERA5-Land exhibited the best correlation
(CC=0.64) and lowest error (RMSE=0.86 mm/d) in winter, but it overestimated the actual daily
precipitation throughout the year, most severely in spring and autumn. Notably, the CC of
CMORPH was only 0.01 in winter, indicating essentially no correlation with ground-based
observations, and CMORPH also showed the strongest underestimation of actual precipitation
(RB=-44.47%). This aligns with the finding of Guo et al. (2017) that CMORPH is unsuitable as an
alternative dataset for regions covered by ice and snow. The RMSE of the estimated precipitation
also varied distinctly among seasons. Averaged over all products, the seasonal RMSE values ranked
from highest to lowest were 3.04 mm/d in summer, 2.32 mm/d in spring, 1.97 mm/d in autumn, and
1.35 mm/d in winter.
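
The seasonal averages quoted above can be reproduced from the RMSE rows of Table 4. The short check below is an illustrative sketch; the variable names are assumptions, and the summer CMFD value is read as 2.27 mm/d.

```python
# RMSE (mm/d) per season from Table 4, in the order
# CHIRPS, CMORPH, CMFD, PERSIANN-CDR, TMPA, ERA5-Land.
rmse = {
    "spring": [2.36, 3.18, 1.80, 2.12, 2.46, 2.03],
    "summer": [3.82, 3.27, 2.27, 3.04, 2.97, 2.85],
    "autumn": [2.25, 2.33, 1.54, 1.87, 2.11, 1.74],
    "winter": [1.07, 1.87, 1.35, 1.12, 1.82, 0.86],
}

# Unweighted mean over the six products for each season.
mean_rmse = {season: sum(v) / len(v) for season, v in rmse.items()}

# Seasons ordered from the largest to the smallest mean RMSE.
ranking = sorted(mean_rmse, key=mean_rmse.get, reverse=True)
```

The resulting order is summer, spring, autumn, winter, and the means agree with the quoted values to within rounding.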

To assess the ability of the different datasets to detect daily precipitation, we compiled the
POD, FAR, and CSI for each dataset. Table 3 shows that CMFD had the best performance in
detecting precipitation and non-precipitation days, with a CSI of 0.39, and CHIRPS had the
worst detection capability (CSI=0.20). The spatial distribution of CSI (Fig. 4) showed a clear
dividing line along the Tianshan Mountains, with the southern region of the study area
exhibiting lower CSI values than the northern region. In addition, the spatial distribution of
POD at the meteorological stations clearly showed that the values of ERA5-Land, CMFD, and
PERSIANN-CDR were markedly greater than those of the remaining three precipitation products,
and the spatial distribution of POD values was relatively uniform, with no high-value
concentration areas. The FAR results implied that 58% of the precipitation events detected by
CMFD were false alarms, and the FARs of the remaining products were all higher than 0.60, with
the highest value occurring for the ERA5-Land dataset, for which 72% of the detected
precipitation events were not observed by the rain gauges (Table 3). Table 4 reveals that
detection skill was highest in summer, when the CSI averaged approximately 0.33 across all
products, with CMFD and CMORPH reaching 0.40 and 0.38, respectively. In contrast, winter was the
season with the lowest detection skill (indicated by CSI) and the highest false alarm rate
(indicated by FAR), with averages of 0.20 and 0.74, respectively. The best performer in winter
was CMFD, with CSI and FAR values of 0.43 and 0.52, respectively, while CMORPH performed
the worst in this season, with an FAR of 0.90 and a CSI of only 0.02.

Fig. 4 Spatial distributions of the statistical indicators (CC, POD, and CSI) for CHIRPS (a1-a3), CMFD (b1-b3),
CMORPH (c1-c3), PERSIANN-CDR (d1-d3), TMPA (e1-e3), and ERA5-Land (f1-f3) precipitation products at
105 meteorological stations in Xinjiang during 1999-2018. POD, probability of detection; CSI, critical success
index; CHIRPS, Climate Hazards Group InfraRed Precipitation with Station; CMFD, China Meteorological Forcing
Dataset; CMORPH, Climate Prediction Center morphing method; PERSIANN-CDR, Precipitation Estimation from
Remotely Sensed Information using Artificial Neural Networks-Climate Data Record; TMPA, Tropical Rainfall
Measuring Mission Multi-satellite Precipitation Analysis; ERA5-Land, ECMWF Reanalysis v5-Land.

Table 4 Statistical evaluation of the six satellite-based and reanalysis precipitation products compared with
observed precipitation data in different seasons during 1999-2018


Season   Statistic     CHIRPS   CMORPH   CMFD    PERSIANN-CDR   TMPA     ERA5-Land
Spring   CC            0.31     0.15     0.60    0.29           0.23     0.57
         RMSE (mm/d)   2.36     3.18     1.80    2.12           2.46     2.03
         RB (%)        4.50     25.13    -3.51   4.81           -17.21   48.36
         POD           0.45     0.46     0.81    0.77           0.38     0.91
         FAR           0.70     0.67     0.59    0.73           0.62     0.74
         CSI           0.22     0.24     0.37    0.25           0.23     0.26
Summer   CC            0.24     0.36     0.67    0.25           0.37     0.50
         RMSE (mm/d)   3.82     3.27     2.27    3.04           2.97     2.85
         RB (%)        5.59     20.09    -2.28   18.39          2.67     19.03
         POD           0.21     0.72     0.87    0.76           0.63     0.93
         FAR           0.45     0.55     0.57    0.63           0.54     0.66
         CSI           0.18     0.38     0.40    0.33           0.36     0.33
Autumn   CC            0.27     0.23     0.63    0.30           0.24     0.58
         RMSE (mm/d)   2.25     2.33     1.54    1.87           2.11     1.74
         RB (%)        2.68     10.58    2.83    18.57          -16.48   50.48
         POD           0.31     0.44     0.80    0.71           0.34     0.89
         FAR           0.63     0.67     0.61    0.71           0.65     0.75
         CSI           0.20     0.23     0.36    0.26           0.21     0.24
Winter   CC            0.31     0.01     0.40    0.27           0.13     0.64
         RMSE (mm/d)   1.07     1.87     1.35    1.12           1.82     0.86
         RB (%)        -3.17    -44.47   9.94    28.55          14.64    29.90
         POD           0.38     0.02     0.80    0.57           0.14     0.92
         FAR           0.75     0.90     0.52    0.76           0.75     0.75
         CSI           0.18     0.02     0.43    0.20           0.10     0.25
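
Because POD, FAR, and CSI are all built from the same three counts (H, T, and F), they are not independent. The relation below is a worked identity added here for illustration; it follows directly from the definitions of the three indicators given earlier.

```latex
\frac{1}{\mathrm{CSI}}
  = \frac{H+T+F}{H}
  = \frac{H+T}{H} + \frac{H+F}{H} - 1
  = \frac{1}{\mathrm{POD}} + \frac{1}{1-\mathrm{FAR}} - 1
```

As a rough check with the Table 3 values for CMFD (POD=0.83, FAR=0.58): 1/0.83 + 1/0.42 - 1 is about 2.59, which gives a CSI of about 0.39, in agreement with the tabulated value. The identity is exact only for pooled counts, so scores that were averaged over stations need only satisfy it approximately.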


4.2 Distribution model of DBMA-fused precipitation weights 


Figure 5 shows the distribution of the average relative weights of the combined members of the
DBMA-fused precipitation during 1999-2018. The CMFD and ERA5-Land members were assigned the
highest relative weights, with average values of 0.57 and 0.38, respectively. The remaining
members received much lower relative weights: CMORPH had an average relative weight of 0.02,
while CHIRPS, PERSIANN-CDR, and TMPA each maintained average relative weights of approximately
0.01. Differences in weight allocation among members were most obvious in their seasonal
variations. The CMFD and ERA5-Land members exhibited comparable relative weights during the
winter and spring seasons but greater disparities during the summer and autumn seasons. The most
pronounced discrepancy occurred around the end of the summer season, when the CMORPH member
reached its peak multi-year average relative weight. The relative weights of the other combined
members showed minimal fluctuations throughout the year, generally remaining within the range of
0.00-0.02.

Fig. 5 Distributions of the multi-year monthly average relative weights of the six satellite-based and reanalysis
precipitation products (CHIRPS, CMFD, CMORPH, PERSIANN-CDR, TMPA, and ERA5-Land) during 1999-2018
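
As a quick consistency check on the averages quoted above, model-averaging weights are expected to sum to one across the six members. The sum below is a worked example added for illustration, with $w_k$ used as an assumed symbol for the average relative weight of member $k$.

```latex
\sum_{k=1}^{6} w_k \approx 0.57 + 0.38 + 0.02 + 3 \times 0.01 = 1.00
```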


Figure 6 shows the average relative weights of the six satellite-based and reanalysis
precipitation products in different seasons during 1999-2018, facilitating the analysis of the
seasonal weight variations in the combined members. Overall, the proportions of average relative
weights assigned to different members by DBMA were generally consistent across seasons.
Specifically, CMFD and ERA5-Land received the highest average relative weights, which together
summed to approximately 0.90, while the remaining members together accounted for approximately
0.10. Nevertheless, the average relative weights of individual members differed among seasons.

In spring and winter, the average relative weights of CMFD were relatively evenly distributed,
ranging from 0.02 to 0.90, with a mean of approximately 0.50. Those of ERA5-Land ranged from
0.10 to 0.95, with a mean of approximately 0.40. The average relative weights of the other
members generally remained below 0.20, with a mean of approximately 0.02. In summer and autumn,
the average relative weights of CMFD ranged from 0.10 to 0.90, with a mean of approximately
0.60, while those of ERA5-Land varied from 0.02 to 0.80, with a mean of approximately 0.30.
Compared with the spring and winter seasons, DBMA assigned higher average relative weights to
CMFD in summer and autumn, whereas ERA5-Land received comparatively lower weights. In addition,
CHIRPS received extremely low average relative weights during the summer and autumn seasons,
with a mean of approximately 0.01, rendering it essentially negligible. For CMORPH, some rain
gauge stations had average relative weights between 0.20 and 0.40, with a mean of approximately
0.06 in summer and autumn, which implied a marked improvement in performance compared with the
other two seasons.

Fig. 6 Average relative weights of the six satellite-based and reanalysis precipitation products (CHIRPS,
CMFD, CMORPH, PERSIANN-CDR, TMPA, and ERA5-Land) used for DBMA calculations in different seasons
during 1999-2018. (a), spring; (b), summer; (c), autumn; (d), winter. Dots indicate data points of average
relative weights. Box boundaries indicate the 25th and 75th percentiles, and whiskers below and above the box
indicate the 10th and 90th percentiles, respectively. The black horizontal line within each box indicates the
median of data points.


4.3 Statistical evaluation and comparison of DBMA-fused precipitation data 


Figure 7 presents the spatial distribution of the evaluation indicators for DBMA-fused
precipitation at the grid scale in Xinjiang; every grid cell used in the assessment contained at
least one rain gauge station. Overall, the performance of the DBMA-fused precipitation data was
satisfactory compared with the ground-based observational daily precipitation data in Xinjiang,
with an overall CC of 0.67 (Fig. 7b). The areas with higher CC values were located around the
Tianshan Mountains, Altay Mountains, and Kunlun Mountains, where melting alpine ice and high
mountain ranges create favorable conditions for precipitation in the surrounding areas. The RMSE
of the DBMA-fused precipitation data decreased from the Tianshan Mountains toward the northern
and southern regions of Xinjiang, ranging from 4.20 to 0.34 mm/d, with a mean value of 1.40 mm/d
for the whole of Xinjiang (Fig. 7a). The mean RB in Xinjiang was 21.5%. Overestimation of
precipitation was mainly distributed around the Tianshan Mountains and the southwestern region of
Xinjiang, where precipitation was more abundant than in other regions, whereas underestimation
mainly occurred in the southeastern and northern regions of Xinjiang, most of which lie around
two deserts, the Taklimakan Desert and the Kumtag Desert (Fig. 7c). Finally, DBMA-fused
precipitation exhibited a high POD, with a mean value of 0.92 (Fig. 7d). The POD increased
gradually from the southeast to the northwest of Xinjiang and was particularly high in the
Tianshan Mountains, indicating that the DBMA approach helped to enhance the spatial detection
capability for precipitation. Based on these statistical indicators, the quality of the
DBMA-fused precipitation data in Xinjiang was broadly acceptable at the daily scale during
1999-2018.

Fig. 7 Spatial distributions of RMSE (a), CC (b), RB (c), and POD (d) of DBMA-fused precipitation data at 105
meteorological stations in Xinjiang during 1999-2018. RB, relative bias.


To further quantify the performance of DBMA in Xinjiang, we selected six evaluation
indicators to compare the daily precipitation from two ensembles (DBMA-fused precipitation and
SMA-based precipitation) and six satellite-based and reanalysis precipitation products at 105
meteorological stations (Table 5). The SMA ensemble had an advantage over the individual members
except CMFD, especially in terms of POD, mainly because SMA assigns the same weight to every
member and the ensemble therefore captures more precipitation events than the individual
members. More importantly, DBMA outperformed SMA on every indicator except POD and outperformed
most individual members. Compared with the best member (i.e., CMFD), DBMA yielded improvement
rates of 2.17%, 7.46%, 21.35%, 9.78%, and 6.45% in terms of MAE, CC, RMSE, POD, and Theil's U,
respectively. Compared with the poorest-performing member for each indicator, the improvement
rates of DBMA were 41.56%, 59.70%, 48.72%, 64.13%, and 45.96% for MAE, CC, RMSE, POD, and
Theil's U, respectively. Notably, the KGE score of DBMA was 0.56, which was slightly lower than
that of the best member (CMFD, 0.59); however, DBMA had clear advantages over the other
individual members. Thus, compared with the individual members and the SMA ensemble, DBMA showed
better overall performance across the evaluation indicators. These results support the
practicability of using DBMA to fuse satellite-based and reanalysis precipitation datasets in the
study area.
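
The improvement rates relative to CMFD can be reproduced from the Table 5 values. The sketch below rests on an inferred convention that is not stated in the text: the gap between the two scores is divided by the member's score for error-type indicators (MAE, RMSE, Theil's U) and by the DBMA score for skill-type indicators (CC, POD). The function and variable names are assumptions.

```python
def improvement_rate(dbma, member, lower_is_better):
    """Relative improvement (%) of DBMA over one member.

    Assumed convention: error-type scores are normalised by the member's
    value, skill-type scores by the DBMA value.
    """
    if lower_is_better:
        return 100.0 * (member - dbma) / member
    return 100.0 * (dbma - member) / dbma

# (DBMA, CMFD, lower_is_better) taken from Table 5.
scores = {
    "MAE":       (0.45, 0.46, True),
    "CC":        (0.67, 0.62, False),
    "RMSE":      (1.40, 1.78, True),
    "POD":       (0.92, 0.83, False),
    "Theil's U": (0.87, 0.93, True),
}

rates = {name: improvement_rate(*vals) for name, vals in scores.items()}
```

Under this convention the five rates round to the percentages quoted for the comparison with CMFD.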


Table 5 Evaluation indicators of daily precipitation from two ensembles (DBMA-fused precipitation and
SMA-based precipitation) and six satellite-based and reanalysis precipitation products at 105 meteorological
stations in Xinjiang


Ensemble/member   MAE (mm/d)   CC      RMSE (mm/d)   POD     KGE score   Theil's U
DBMA              0.45*        0.67*   1.40*         0.92    0.56        0.87*
SMA               0.59         0.55    1.80          0.97*   0.35        1.30
CHIRPS            0.72         0.27    2.58          0.33    0.27        1.19
CMORPH            0.77         0.25    2.73          0.45    0.24        1.16
CMFD              0.46         0.62    1.78          0.83    0.59*       0.93
PERSIANN-CDR      0.76         0.28    2.16          0.71    0.10        1.61
TMPA              0.70         0.28    2.38          0.40    0.27        1.29
ERA5-Land         0.62         0.55    2.00          0.92    0.36        0.93


Note: DBMA, dynamic Bayesian model averaging; SMA, simple model averaging. * represents the best score for the compared
ensembles and members.


According to the spatial patterns of multi-year average precipitation from the eight precipitation
datasets (two ensembles and six satellite-based and reanalysis precipitation products) during
1999-2018 (Fig. 8), multi-year average precipitation generally decreased from the northwest to
the southeast of Xinjiang. The areas with lower precipitation were located in the southern region
of Xinjiang, which is occupied by the Taklimakan Desert, while relatively abundant precipitation
was found in the higher-altitude regions of the Tianshan Mountains, Altay Mountains, and the
narrow central part of the Kunlun Mountains. These findings also indicated that the geographical
distribution of precipitation in Xinjiang was closely related to altitude. Nevertheless, the
datasets differed markedly in the spatial pattern of multi-year average precipitation. For the
CHIRPS dataset, the precipitation-rich regions were concentrated in the foothills of the Tianshan
Mountains and the Ili River Valley, where precipitation reached approximately 800-1000 mm and was
as high as 1200 mm in some areas on the northern slopes of the Tianshan Mountains, representing a
severe overestimation in some regions. For the CMORPH and TMPA products, the spatial patterns of
multi-year average precipitation lacked continuity, and inherent product characteristics produced
numerous outliers in regions with water bodies (Tang et al., 2016). Compared with TMPA, CMORPH
gave higher estimates of multi-year average precipitation in the Kunlun Mountains and poorer
estimates in the Ili River Valley. The PERSIANN-CDR dataset showed the highest multi-year average
precipitation in the western region of Xinjiang, with a gradual decrease from west to east in a
circular pattern, which did not align with actual conditions. The multi-year average
precipitation distributions of CMFD, ERA5-Land, SMA, and DBMA all exhibited the spatial pattern
of "high in the north and low in the south", with higher precipitation near the high-altitude
Tianshan Mountains, Altay Mountains, and Kunlun Mountains. The overall low-precipitation area
extending from the Tarim Basin eastwards to the Turpan-Hami Basin was basically consistent with
the actual precipitation profile of Xinjiang. However, the ERA5-Land product significantly
overestimated multi-year average precipitation on the northern and southern slopes of the
Tianshan Mountains and in the Ili River Valley, reaching approximately 1700 mm in some regions,
indicating substantial deviation from actual precipitation.


4.4 Comparison of DBMA-fused precipitation dataset with the latest IMERG-F 
precipitation product 


Error analysis of 105 meteorological stations revealed that DBMA outperformed IMERG-F for all 
the evaluation indicators in Xinjiang (Table 6). Specifically, DBMA-fused precipitation dataset 


XU Wenjie et al.: Improving the accuracy of precipitation estimates in a typical... 347 


[Fig. 8 panels: (a) CHIRPS; (c) CMORPH; (d) PERSIANN-CDR; (f) ERA5-Land; (g) SMA; (h) DBMA. Color scale: multi-year average precipitation (mm).]

Fig. 8 Spatial distributions of multi-year average precipitation in Xinjiang during 1999-2018 based on CHIRPS

Table 6 Daily-scale evaluation indicators of DBMA-fused precipitation dataset and the latest IMERG-F product
at 105 meteorological stations in Xinjiang during 2001-2014


Dataset   MAE (mm/d)   CC      RMSE (mm/d)   POD     KGE score   Theil's U
DBMA      0.55*        0.65*   1.79*         0.87*   0.54*       0.93*
IMERG-F   0.71         0.42    2.28          0.68    0.38        1.16


Note: * represents the best score for the compared datasets.


showed lower MAE, RMSE, and Theil's U values, with improvements of 22.97%, 21.52%, and 
19.90%, respectively, over the IMERG-F precipitation product. The CC and KGE indicators of 
DBMA-fused precipitation dataset were the most satisfactory, with improvements of 34.86% and 
29.20%, respectively, compared with the IMERG-F precipitation product. In addition, DBMA 
significantly improved the actual precipitation event detection, with a POD of 0.87. Preliminary 
comparisons with this product indicated that the accuracy of the developed DBMA-fused 
precipitation dataset was acceptable for a multi-source satellite precipitation product in Xinjiang. 

For a more comprehensive evaluation of the hydrological utility of the DBMA-fused precipitation
dataset in the Ebinur Lake Basin, Xinjiang, we used two precipitation datasets (the DBMA-fused
precipitation dataset and the IMERG-F precipitation product) as alternative precipitation inputs
for streamflow simulation with the VIC model. Figure 9 compares the monthly streamflow simulated
from these two precipitation datasets with observations at the Wenquan station in the Ebinur
Lake Basin during 2001-2014. The VIC-simulated streamflow driven by either precipitation dataset
followed the same trend as the observed streamflow and captured most of the streamflow peaks.
The DBMA-driven simulation performed better in summer than in winter, when streamflow tended to
be overestimated and fluctuated erratically. The IMERG-F-driven streamflow was greater than the
observed streamflow throughout the study period (2001-2014), with more pronounced fluctuations
in winter. Compared with the IMERG-F-driven simulation, the DBMA-driven simulation was more
stable and accurate overall, with a higher NSE value (NSE=0.68).
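
For reference, the NSE used to score the simulated streamflow can be computed as in the following sketch. The function name and the toy series are assumptions, and the values are not taken from the Wenquan record.

```python
import numpy as np

def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 minus the squared simulation error
    divided by the squared deviation of observations from their mean."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Toy monthly flows, invented for illustration.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.0, 2.0, 3.0, 5.0]
score = nse(sim, obs)   # error sum of squares 1, observed sum of squares 5
```

An NSE of 1 indicates a perfect match, while an NSE of 0 means that the simulation is no better than using the mean observed flow as the prediction.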
Fig. 9 Comparison of simulated streamflow from the VIC model driven by DBMA-fused precipitation dataset
(a) and IMERG-F precipitation product (b) with observed streamflow from the Wenquan hydrological station in
the Ebinur Lake Basin from 2001 to 2014. VIC, Variable Infiltration Capacity; IMERG-F, Integrated
Multi-satellitE Retrievals for Global Precipitation Measurement Final; NSE, Nash-Sutcliffe efficiency.


5 Discussion


5.1 Rationality analysis 


Satellite-based and reanalysis precipitation estimates are particularly important alternative 
solutions for hydrological applications in ungauged areas and regions with restricted availability 
of precipitation information (Zambrano et al., 2017). In this paper, six domestic and foreign 
satellite-based and reanalysis precipitation datasets were selected, and their applicability in 
Xinjiang, China, was comprehensively evaluated using traditional evaluation methods based on 
ground-based observational precipitation data. On the basis of these data, the DBMA approach 
was used as the core to establish the basic framework for the fusion of multi-source precipitation 
data. Site validation and assessment of hydrological model-simulated streamflow demonstrated 
that employing the DBMA approach to fuse individual satellite-based precipitation estimates is 
reliable in the arid regions of Northwest China. 

Comprehensive assessment of the six satellite-based and reanalysis precipitation datasets based 
on the ground-based observational precipitation data revealed that three high-resolution datasets 
(CMFD, ERA5-Land, and CHIRPS) provide more accurate representations of the spatial 
distribution of precipitation in Xinjiang (Hu et al., 2021). Unlike the strong spatial dependence of 
CHIRPS in the upper and lower reaches of the Indus River Basin, India (Shahid et al., 2021), the 
spatial dependence of CHIRPS data in Xinjiang of China is limited, and the precipitation 
distribution is consistent with the actual situation. However, ERA5-Land tends to overestimate 
the actual precipitation on the northern slopes of the Tianshan Mountains and the Ili River Valley. 
Chen et al. (2021) reported that ERA5-Land precipitation estimates can effectively reflect the 
obstructive effect of terrain. The Tianshan Mountains and Altay Mountains block moisture from 
the Atlantic and Arctic Oceans and form precipitation centers on windward mountain slopes, 
resulting in reduced water vapor transport into the southern region of Xinjiang (Zambrano et al., 
2017). The evaluation results showed that CMFD performs best in terms of precipitation detection 
because of its high correlation with "actual" precipitation at different time scales and low data 
bias. This difference is attributed mainly to the fact that the production process is augmented with 
a greater abundance of in situ station-based observational data coupled with the integration of 
surface measurements and remote sensing fusion algorithms, resulting in superior performance 
over Global Land Data Assimilation System (GLDAS) in western China (He et al., 2020). 
Additionally, consistent conclusions were reached in evaluations of precipitation datasets in the
Tibetan Plateau region of China, where the CMFD product exhibited greater accuracy at different
time scales than the TMPA, CHIRPS, and ERA5-Land datasets (Wu et al., 2019; Liu et al., 2023). In
contrast, CMORPH performs the poorest across different time scales, which is most evident in
winter precipitation monitoring, where its POD and CSI are only 0.02 and its FAR is as high as 0.90.
Relevant research has shown that passive microwave precipitation retrieval techniques cannot 
easily separate precipitation radiative signals from ice-covered frozen areas (Behrangi et al., 2014), 
usually assigning zero to missing precipitation data (Mei et al., 2014). Given that a significant 
portion of Xinjiang is covered by ice or snow in winter, CMORPH precipitation estimates are highly 
susceptible to ice and snow cover, and the rain gauge correction algorithm does not seem to 
ameliorate this situation, thus making CMORPH less suitable as an alternative for winter 
precipitation estimates in ice-covered frozen areas (Zhang et al., 2018). Moreover, CMORPH has 
more severe precipitation misrepresentation around inland water bodies, warranting cautious use for 
regions containing large water bodies (Guo et al., 2017). 

The DBMA approach calculates optimal relative weights for each ensemble member based on
ground-based observational precipitation data; it also creates a predictive probability density
function (PDF) for merged precipitation through statistical ensemble blending and returns the
predictive distribution of individual variables as a weighted average of the posterior
distributions (Raftery et al., 2005). Each member's relative weight is proportional to its
performance during the training period and thus reflects the uncertainty of the individual
predictions. In this study, the CMFD and ERA5-Land datasets significantly outperform the other
satellite-based and reanalysis precipitation datasets and consistently receive the highest
relative weights among the combined members in each season (Fig. 6). The likelihood of each
member measures the consistency between its precipitation estimates and the observed
precipitation, so well-performing members obtain higher relative weights and the combined DBMA
data benefit from them (Sloughter et al., 2007), which is verified by the accuracy evaluation
results of the above precipitation datasets in Xinjiang. In addition, the streamflow simulation
driven by the DBMA-fused precipitation data in the Ebinur Lake Basin captures most of the
streamflow peaks and is more robust than that driven by the IMERG-F precipitation product. The
slight overestimation of summer streamflow by the DBMA-driven simulation is partly due to the
higher relative weights assigned to CMFD and ERA5-Land in the fused precipitation, which
generally overestimates actual precipitation throughout the year. Another contributing factor is
the reduction in river streamflow due to unnatural factors (e.g., increased human withdrawals in
summer) (Bao et al., 2022). Thus, it is beneficial to find appropriate precipitation data for
merging in Xinjiang, where meteorological stations are sparsely distributed and located in
regions with complex topography and climate variability; further research on, or updating of, the
existing DBMA framework to include new and improved data sources is needed. For example, the
HRLT dataset, a high-resolution (temporal resolution of 1 d and spatial resolution of 1 km) and
long-term (1961-2019) dataset, is recommended for future related studies (Qin et al., 2022).
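
To make the weighting idea concrete, the sketch below estimates BMA weights with the expectation-maximization algorithm in the Gaussian setting of Raftery et al. (2005). It is deliberately simplified and is not the configuration used in this study: it assumes Gaussian kernels with one shared variance, a single static training window, and synthetic data, whereas the study uses a dynamic scheme with a "discrete-continuous" precipitation model. All names are hypothetical.

```python
import numpy as np

def bma_em(forecasts, obs, n_iter=200, tol=1e-8):
    """EM estimate of BMA weights and a shared error variance.

    forecasts: array (n_times, m) of member estimates.
    obs:       array (n_times,) of observations.
    Gaussian kernels are an assumption of this sketch.
    """
    f = np.asarray(forecasts, dtype=float)
    y = np.asarray(obs, dtype=float)
    n, m = f.shape
    w = np.full(m, 1.0 / m)                      # start from equal weights
    sigma2 = np.var(y - f.mean(axis=1)) + 1e-6   # initial shared variance
    err2 = (y[:, None] - f) ** 2
    prev_ll = -np.inf
    for _ in range(n_iter):
        dens = np.exp(-0.5 * err2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)
        mix = dens * w
        total = mix.sum(axis=1, keepdims=True) + 1e-300
        z = mix / total                          # E-step: member responsibilities
        w = z.mean(axis=0)                       # M-step: updated weights
        sigma2 = np.sum(z * err2) / n + 1e-12    # M-step: updated variance
        ll = np.sum(np.log(total))               # log-likelihood before the update
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return w, sigma2

# Synthetic example: member 0 tracks the observations closely,
# member 1 is noisy, and member 2 is unrelated to the observations.
rng = np.random.default_rng(0)
y = rng.normal(0.0, 3.0, 400)
f = np.column_stack([
    y + rng.normal(0.0, 0.5, 400),
    y + rng.normal(0.0, 3.0, 400),
    rng.normal(0.0, 3.0, 400),
])
w, sigma2 = bma_em(f, y)
fused = f @ w    # weighted sum of the members
```

In this toy setting the most accurate member receives the largest weight, which mirrors how better-performing members dominate a BMA-type fusion.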


5.2 Uncertainty analysis 


Accurate precipitation estimates are very difficult to obtain in regions with sparse or no
meteorological stations. As an alternative, satellite-based and reanalysis precipitation datasets
provide homogeneous precipitation estimates at regional or global scales. In this study, by
applying the DBMA approach, six satellite-based and reanalysis precipitation products with
different grid scales were fused with adequate error assessment and successfully applied to the
arid region of Northwest China, i.e., Xinjiang, which has complex topography and a changeable
climate. Nevertheless, the uncertainties inherent in this study still warrant attention.

(1) The limitation of traditional evaluation methods lies in their ability to assess precipitation 
products only against meteorological stations, with observation errors related to precipitation in 
regions lacking rain gauges remaining unaddressed, especially in the arid regions of Northwest 
China with complex topography and variable climate (Zhang et al., 2018; Hasan et al., 2023). In 
addition, the scale mismatch between gridded estimates and station-based rain gauge 
measurements may be one of the sources of errors in traditional gridded precipitation product 
evaluation methods (Ebrahimi et al., 2017). 

(2) The weight gridding of DBMA approach considers spatial dependence, and the relative 
weight scores of unmeasured regions are obtained by interpolating the individual best weights of 
surrounding stations via the ordinary Kriging interpolation method. Notably, the spatial 
distribution of meteorological stations in Xinjiang is very uneven and sparse, so there is a large 
uncertainty in the interpolation process. Additionally, the choice of interpolation algorithm may 
lead to DBMA-fused precipitation data having different applicability in this region (Hu et al., 
2016). 

(3) The predictive distribution of cumulative precipitation is asymmetric and consists of two
main parts: a large proportion of precipitation values are zero, and the nonzero component
follows a gamma distribution (Sloughter et al., 2007). However, the DBMA approach requires the
predictive distributions of individual members to follow a Gaussian distribution. Ma et al.
(2018) applied the Box-Cox transformation to the nonnormal precipitation distribution to
approximate normality before applying the DBMA approach, while this study used the
"discrete-continuous" model to fit the nonnormal distribution of precipitation. The
transformation and fitting of the predictive cumulative precipitation distribution greatly
affect the computation of individual weights in the DBMA approach. Hence, choosing an
appropriate fitting method that aligns with the precipitation characteristics of the study area
is of the utmost importance. The Box-Cox transformation of the precipitation distribution could
be explored for DBMA-fused precipitation in future work.

(4) The satellite-based and reanalysis precipitation datasets used for DBMA fusion have
different spatial resolutions, so bilinear interpolation must be applied to some of them to
standardize them to a spatial resolution of 0.1°. According to the theory of error
propagation, the errors generated in the interpolation process further increase the
uncertainty in the merging of individual members (Abdollahipour et al., 2021), raising the
potential risk to the adaptability of fused precipitation in Xinjiang. Therefore, we
recommend selecting multi-source precipitation data with the same spatial resolution for the
fusion process to minimize the use of interpolation methods.
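
The Box-Cox transformation mentioned in point (3) can be sketched as follows. This is a minimal illustration only: the function names, the sample wet-day amounts, and the value of the parameter lam are assumptions made here, and the sketch does not reproduce the procedure of Ma et al. (2018). Because the transformation is defined only for positive values, zero-precipitation days are assumed to be handled separately.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of one positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox is defined for positive values only")
    if lam == 0:
        return math.log(x)  # limiting case as lam -> 0
    return (x ** lam - 1.0) / lam


def inverse_box_cox(y, lam):
    """Map a transformed value back to the original precipitation scale."""
    if lam == 0:
        return math.exp(y)
    return (lam * y + 1.0) ** (1.0 / lam)


# Hypothetical wet-day amounts (mm/d); zeros are excluded before transforming.
wet_days = [0.3, 1.2, 4.8, 12.5]
lam = 0.25  # illustrative value, not fitted to any data
transformed = [box_cox(p, lam) for p in wet_days]
recovered = [inverse_box_cox(t, lam) for t in transformed]
```

In practice lam would be estimated from the data, for example by maximum likelihood, so that the transformed wet-day amounts are as close to Gaussian as possible.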

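The bilinear resampling step described in point (4) can be illustrated with a short sketch. The function name, the 0.25° source cell, and the precipitation values below are hypothetical; the sketch assumes a regular latitude-longitude grid with strictly increasing coordinates and is not the implementation used in this study.

```python
def bilinear(grid, lats, lons, lat, lon):
    """Interpolate grid[i][j] (value at lats[i], lons[j]) to the point (lat, lon).

    lats and lons must be strictly increasing and the target must lie
    inside the source grid.
    """
    def bracket(axis, v):
        # Find the interval [axis[k], axis[k + 1]] containing v and the
        # fractional position of v inside it.
        if not axis[0] <= v <= axis[-1]:
            raise ValueError("target lies outside the source grid")
        for k in range(len(axis) - 1):
            if axis[k] <= v <= axis[k + 1]:
                return k, (v - axis[k]) / (axis[k + 1] - axis[k])

    i, ty = bracket(lats, lat)
    j, tx = bracket(lons, lon)
    # Interpolate along longitude on the two bounding latitude rows,
    # then along latitude between the two results.
    south = (1 - tx) * grid[i][j] + tx * grid[i][j + 1]
    north = (1 - tx) * grid[i + 1][j] + tx * grid[i + 1][j + 1]
    return (1 - ty) * south + ty * north


# Hypothetical 0.25-degree source cell (mm/d) resampled to a 0.1-degree node.
lats = [40.0, 40.25]
lons = [80.0, 80.25]
coarse = [[0.0, 1.0],
          [2.0, 3.0]]
value = bilinear(coarse, lats, lons, 40.1, 80.1)
```

Each resampled value is a weighted average of four neighbouring source cells, which smooths local extremes; this is one way in which the interpolation error discussed above enters the merged product.
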

6 Conclusions 


In the present study, we evaluated the applicability of six satellite-based and reanalysis precipitation
products (CHIRPS, CMFD, CMORPH, PERSIANN-CDR, TMPA, and ERA5-Land) in Xinjiang at
both temporal and spatial scales. On this basis, we proposed a general framework, with the DBMA
approach at its core, for merging precipitation data with different spatial resolutions. We calculated
the daily optimal DBMA weights of the members during 1999-2018 and used the ordinary Kriging
interpolation method to spread the station-based relative weights across the entire Xinjiang region;
the weighted sum of the precipitation products constitutes the DBMA-fused precipitation. The main
findings of the study are summarized as follows:

(1) Among the six satellite-based and reanalysis precipitation products, CMFD exhibits the
highest reliability in capturing the spatial pattern of precipitation and in the statistical
analysis of daily precipitation. ERA5-Land performs well in terms of fitting and error analysis of
winter precipitation, but it tends to overestimate precipitation at precipitation centers.
CMORPH is considered unsuitable as an alternative precipitation dataset in Xinjiang, especially for 
winter precipitation. The accuracy and actual precipitation detection capability of all the 
satellite-based and reanalysis precipitation products are better in summer. 

(2) The average relative weights of the CMFD and ERA5-Land datasets are 0.57 and 0.38,
respectively. CMORPH has an average relative weight of 0.02, with a peak of 0.08 in late summer
and early autumn. The remaining precipitation datasets have average relative weights of
approximately 0.01. Based on the seasonal distribution of average relative weights alone, CMFD
has higher relative weights in summer and autumn than in spring and winter, while ERA5-Land
shows the opposite pattern.

(3) The performance of DBMA-fused precipitation data evaluated at the independent 
meteorological stations in Xinjiang is generally satisfactory, showing a CC value of 0.67 with the 
ground-based observational precipitation. The improvement of DBMA-fused precipitation over the 
best merged member (CMFD) reaches 7.46% in CC, and its improvement in RMSE reaches 
21.35%. Importantly, DBMA-fused precipitation improves the detection of actual precipitation
events (POD=0.92).

(4) DBMA-fused precipitation data show a clear advantage over the most advanced IMERG-F
data, with improvements across the different indicators exceeding 19%. The streamflow
simulation results in the Ebinur Lake Basin showed that using DBMA-fused precipitation as the
hydrological driver of the VIC model is the most effective in streamflow simulation, and its
NSE exceeds that of the IMERG-F precipitation dataset. The DBMA-fused precipitation can
capture most of the streamflow peaks.
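
The weighted-sum step behind the fused product can be sketched for a single grid cell and day. The weights below are the average relative weights from finding (2), with each of the three remaining members taken as 0.01; the member precipitation values and the function name are hypothetical. In the framework itself the weights vary by day and by location, so this is an illustration under simplifying assumptions rather than the procedure used in the study.

```python
def fuse(weights, precip):
    """Weighted sum of member precipitation (mm/d) for one grid cell and day."""
    if set(weights) != set(precip):
        raise ValueError("weights and precip must cover the same members")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Renormalize so that rounded weights still sum to one.
    return sum(weights[m] / total * precip[m] for m in weights)


# Average relative weights from finding (2); the three remaining members
# are each taken as 0.01 ("approximately 0.01" in the text).
weights = {"CMFD": 0.57, "ERA5-Land": 0.38, "CMORPH": 0.02,
           "CHIRPS": 0.01, "PERSIANN-CDR": 0.01, "TMPA": 0.01}
# Hypothetical member estimates for a single cell and day (mm/d).
precip = {"CMFD": 2.0, "ERA5-Land": 3.0, "CMORPH": 0.5,
          "CHIRPS": 1.0, "PERSIANN-CDR": 1.5, "TMPA": 1.0}
fused = fuse(weights, precip)
```

With these weights the fused value is dominated by CMFD and ERA5-Land, which together carry 95% of the total weight.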

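The indicators quoted in findings (3) and (4) follow standard definitions, which can be sketched as below. The 0.1 mm/d rain/no-rain threshold, the sample series, and the relative-improvement formulas are assumptions made for illustration; the exact settings used in this study may differ.

```python
import math


def rmse(obs, est):
    """Root mean square error between observed and estimated series."""
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(obs, est)) / len(obs))


def pearson_cc(obs, est):
    """Pearson's correlation coefficient (both series must vary)."""
    n = len(obs)
    mo, me = sum(obs) / n, sum(est) / n
    cov = sum((o - mo) * (e - me) for o, e in zip(obs, est))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    se = math.sqrt(sum((e - me) ** 2 for e in est))
    return cov / (so * se)


def pod(obs, est, threshold=0.1):
    """Probability of detection: hits / (hits + misses) on observed wet days."""
    hits = sum(1 for o, e in zip(obs, est) if o >= threshold and e >= threshold)
    misses = sum(1 for o, e in zip(obs, est) if o >= threshold and e < threshold)
    return hits / (hits + misses)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency of simulated against observed streamflow."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def improvement(new, ref, higher_is_better=True):
    """Relative improvement (%) of `new` over the reference value `ref`."""
    change = (new - ref) if higher_is_better else (ref - new)
    return 100.0 * change / ref


# Hypothetical daily series (mm/d) at one station.
observed = [0.0, 1.0, 2.0, 0.0, 4.0]
estimated = [0.0, 1.5, 1.5, 0.2, 3.0]
scores = (rmse(observed, estimated), pearson_cc(observed, estimated),
          pod(observed, estimated))
```

An NSE of 1 indicates a perfect match, while a simulation that always returns the observed mean scores 0.
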
Overall, the proposed dynamic-based BMA precipitation fusion process is feasible for the entire 
Xinjiang region. In the future, it would be beneficial to consider data sources that exhibit better 
consistency with the study area for the fusion of multiple precipitation sources in this region. 